
    Improving Entity Linking by Modeling Latent Entity Type Information

    Existing state-of-the-art neural entity linking models employ an attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic-level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which often causes the models to link mentions to incorrect entities of the wrong type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on the standard benchmark (AIDA-CoNLL). Detailed experimental analysis demonstrates that our model corrects most of the type errors produced by the direct baseline. Comment: Accepted by AAAI 202

    A new result on observer-based sliding mode control design for a class of uncertain Itô stochastic delay systems

    © 2017 The Franklin Institute. This paper develops a new observer-based sliding mode control (SMC) scheme for a general class of Itô stochastic delay systems (SDS). The key merit of the presented scheme lies in its simplicity and in the integrity of the design process of the traditional sliding mode observer (SMO) strategy, i.e., the state observer and sliding surface design as well as the associated sliding mode controller synthesis. To guarantee the applicability of the scheme, a new LMI-based criterion is established to ensure the exponential stability of the underlying sliding mode dynamics (SMDs) in the mean-square sense with H∞ performance. A bench test example is provided to numerically demonstrate the efficacy of the scheme and to illustrate the application procedure for potential readers/users interested in their own ad hoc applications and in extending the methodology.

    TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

    Text design is one of the most critical procedures in poster design, as it relies heavily on the creativity and expertise of humans to design text images that account for visual harmony and text semantics. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a style hint and guides the text image generation with visual harmony. Furthermore, we leverage a language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. In addition, we construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image generation. Extensive quantitative and qualitative experiments demonstrate that TextPainter can generate visually and semantically harmonious text images for posters. Comment: Accepted to ACM MM 2023. Dataset Link: https://tianchi.aliyun.com/dataset/16003

    Too Large; Data Reduction for Vision-Language Pre-Training

    This paper examines the problems of severe image-text misalignment and high redundancy in the widely used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions for selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR enables us to reduce the large dataset into a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pre-training process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reducing the well-cleaned CC3M dataset from 2.82M to 0.67M (~24%) and the noisy YFCC15M from 15M to 2.5M (~16.7%). Extensive experiments with three popular VLP models over seven downstream tasks show that a VLP model trained on the compressed dataset provided by TL;DR can achieve similar or even better results compared with training on the full-scale dataset. The code will be made available at https://github.com/showlab/data-centric.vlp. Comment: Work in progress. Code: https://github.com/showlab/data-centric.vl

    Genome-Wide DNA Methylation Profiling in Human Breast Tissue by Illumina TruSeq Methyl Capture EPIC Sequencing and Infinium MethylationEPIC Beadchip Microarray

    A newly-developed platform, the Illumina TruSeq Methyl Capture EPIC library prep (TruSeq EPIC), builds on the content of the Infinium MethylationEPIC Beadchip Microarray (EPIC-array) and leverages the power of next-generation sequencing for targeted bisulphite sequencing. We empirically examined the performance of TruSeq EPIC and EPIC-array in assessing genome-wide DNA methylation in breast tissue samples. TruSeq EPIC provided data with a much higher density in the regions when compared to EPIC-array (~2.74 million CpGs with at least 10X coverage vs ~752 K CpGs, respectively). Approximately 398 K CpGs were common and measured across the two platforms in every sample. Overall, there was high concordance in methylation levels between the two platforms (Pearson correlation r = 0.98, P < 0.0001). However, we observed that TruSeq EPIC measurements provided a wider dynamic range and likely a higher quantitative sensitivity for CpGs that were either hypo- or hyper-methylated (β close to 0 or 1, respectively). In addition, when comparing different breast tissue types, TruSeq EPIC identified more differentially methylated CpGs than EPIC-array, not only out of additional sites interrogated by TruSeq EPIC alone, but also out of common sites interrogated by both platforms. Our results suggest that both platforms show high reproducibility and reliability in genome-wide DNA methylation profiling, while TruSeq EPIC had a significant improvement over EPIC-array regarding genomic resolution and coverage. The wider dynamic range and likely higher precision of the estimates by the TruSeq EPIC may lead to the identification of novel differentially methylated markers that are associated with disease risk.